“Workshop: ggplot2: from scratch to compelling graphs”
the simple way of plotting in R:
plot(price ~ carat, data=diamonds)
hist(diamonds$price)
boxplot(diamonds$price)
using ggplot2 with its built-in diamonds dataset (it ships with the ggplot2 package, not with RStudio); the first six rows:
## carat cut color clarity depth table price x y z
## 1 0.23 Ideal E SI2 61.5 55 326 3.95 3.98 2.43
## 2 0.21 Premium E SI1 59.8 61 326 3.89 3.84 2.31
## 3 0.23 Good E VS1 56.9 65 327 4.05 4.07 2.31
## 4 0.29 Premium I VS2 62.4 58 334 4.20 4.23 2.63
## 5 0.31 Good J SI2 63.3 58 335 4.34 4.35 2.75
## 6 0.24 Very Good J VVS2 62.8 57 336 3.94 3.96 2.48
aes(): map the diamond’s cut value to the color aesthetic with aes(color=cut)
ggplot(diamonds, aes(x=carat, y=price)) + geom_point(aes(color=cut))
or hard-code the color by setting color='blue' outside aes()
ggplot(diamonds, aes(x=carat, y=price)) + geom_point(color='blue')
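The difference between mapping and setting can be checked by reading back the colors ggplot2 actually assigned. The following is a minimal sketch, not part of the original workshop: the 500-row sample and the variable names are assumptions chosen only to keep it quick, and layer_data() is used to inspect the computed layer.

```r
library(ggplot2)

# Small random sample of diamonds, only to keep the example fast (an assumption).
set.seed(1)
d <- diamonds[sample(nrow(diamonds), 500), ]

# Setting: color is a constant outside aes(), so every point is literally blue.
p_set <- ggplot(d, aes(x = carat, y = price)) + geom_point(color = "blue")

# Mapping: inside aes(), "blue" is treated as a data value with a single level,
# so ggplot2 assigns a color from its default palette and adds a legend.
p_map <- ggplot(d, aes(x = carat, y = price)) + geom_point(aes(color = "blue"))

set_colours <- unique(layer_data(p_set)$colour)
map_colours <- unique(layer_data(p_map)$colour)

set_colours   # the literal color that was set
map_colours   # a palette color, not "blue"
```

Putting a constant inside aes() is a common slip; the unexpected legend is usually the first sign of it.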
geom_histogram automatically calculates the bar heights, so no y value is necessary. change the resolution with the binwidth argument, which is set outside aes()
ggplot(diamonds) + geom_histogram(aes(x=price), binwidth = 100)
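To see what binwidth does, the computed bars can be inspected directly. This is a rough sketch under assumptions: the binwidth of 1000 and the bins = 50 alternative are illustrative values, not ones used in the workshop.

```r
library(ggplot2)

# binwidth fixes the width of each bar in data units (here, dollars).
p_width <- ggplot(diamonds) + geom_histogram(aes(x = price), binwidth = 1000)

# bins fixes the number of bars instead and lets ggplot2 derive the width.
p_bins <- ggplot(diamonds) + geom_histogram(aes(x = price), bins = 50)

bars_width <- layer_data(p_width)
bars_bins  <- layer_data(p_bins)

# Every bar in p_width spans the requested width.
bar_widths <- bars_width$xmax - bars_width$xmin
range(bar_widths)

# The bar heights are counts, so they add up to the number of diamonds.
total_count <- sum(bars_width$count)
total_count == nrow(diamonds)

# Width that ggplot2 derived for the bins = 50 version.
unique(round(bars_bins$xmax - bars_bins$xmin, 2))
```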
the color property sets the border on 2d objects, fill sets the fill color: fill='red', color='blue'
ggplot(diamonds) + geom_histogram(aes(x=price), fill='red', color='blue')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
when you specify aes() inside the ggplot() function, it applies to all layers. in this case color is mapped to the diamond’s cut in ggplot()’s aes(), so geom_smooth() also draws one curve per cut (repeating aes(color=cut) in geom_point() is then redundant but harmless), like so… ggplot(diamonds, aes(x=carat, y=price, color=cut))
ggplot(diamonds, aes(x=carat, y=price, color=cut)) +
geom_point(aes(color=cut)) +
geom_smooth()
while in this case, color is mapped to cut only in geom_point()’s aes(), so geom_smooth() draws a single curve through all the data.
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point(aes(color=cut)) +
geom_smooth()
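The effect of where aes() is placed can be confirmed by counting the groups that the smoothing layer ends up with. This is a sketch under stated assumptions: it uses a 2000-row sample and method = "lm" purely to keep the fit fast, whereas the workshop code uses the default smoother on the full data.

```r
library(ggplot2)

# Sample and linear smoother are assumptions made for speed, not the workshop's settings.
set.seed(1)
d <- diamonds[sample(nrow(diamonds), 2000), ]

# color mapped in ggplot(): inherited by every layer, including geom_smooth().
p_global <- ggplot(d, aes(x = carat, y = price, color = cut)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# color mapped only in geom_point(): geom_smooth() never sees it.
p_local <- ggplot(d, aes(x = carat, y = price)) +
  geom_point(aes(color = cut)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Layer 2 is the smoother; each group becomes one fitted curve.
curves_global <- length(unique(layer_data(p_global, 2)$group))
curves_local  <- length(unique(layer_data(p_local, 2)$group))

curves_global   # one curve per cut
curves_local    # a single curve
```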
transparency is achieved here by setting alpha=1/3 in geom_point()
ggplot(diamonds, aes(x=carat, y=price)) +
geom_point(aes(color=cut), shape=1, size=2, alpha=1/3) +
geom_smooth()
assign the base plot to a variable so it can be reused!
g <- ggplot(diamonds, aes(x=carat, y=price, color=cut))
g + geom_point() + facet_wrap( ~ cut)
1d row or col of facets
g + geom_point() + facet_wrap( ~ cut, nrow=1, scales='free') #also try ncol=1
2d grid of facets
g + geom_point() + facet_grid(color ~ clarity)
the scales='free' argument allows the axis scales of individual facets to vary
scales='free_y' frees the y axis and keeps the x axis fixed
using free scales can be misleading: at a glance, the free-scaled plot suggests all cuts of diamonds are the same price
to research: formula notation (the use of ~)
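As a starting point for the formula notation, the facet specifications used above can be looked at as ordinary R objects. This is an illustrative sketch, not the workshop's material: the variable names are made up, and vars() is shown as the non-formula way ggplot2 accepts the same specification.

```r
library(ggplot2)

# A formula is a regular R object; ~ separates a left-hand side from a right-hand side.
grid_spec <- color ~ clarity   # facet_grid(): rows ~ columns
wrap_spec <- ~ cut             # facet_wrap(): one-sided, only the faceting variable

class(grid_spec)       # "formula"
all.vars(grid_spec)    # "color" "clarity"
all.vars(wrap_spec)    # "cut"

# The same grid written without a formula, using vars().
g_base    <- ggplot(diamonds, aes(x = carat, y = price)) + geom_point()
p_formula <- g_base + facet_grid(color ~ clarity)
p_vars    <- g_base + facet_grid(rows = vars(color), cols = vars(clarity))
```

The variables in a formula are not evaluated when it is written, which is why color ~ clarity works even though no objects with those names exist outside the data frame.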
compare the following two examples
ggplot(diamonds, aes(x=price)) + geom_histogram() +
facet_wrap(~cut)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(diamonds, aes(x=price)) + geom_histogram() +
facet_wrap(~cut, scales = 'free')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(diamonds, aes(x=1, y=price)) + geom_boxplot()
ggplot(diamonds, aes(x=cut, y=price)) + geom_boxplot()
...vs. violins (more informative!)
ggplot(diamonds, aes(x=cut, y=price)) + geom_violin()
more on violin plots…
when we add geom_point, the x axis is discrete, so the points pile up in vertical lines and are not very helpful
ggplot(diamonds, aes(x=cut, y=price)) + geom_violin() +
geom_point()
using jittering makes the points more useful because it shows their density. also, the order matters! put the violin in the top layer (last) so it stays visible
ggplot(diamonds, aes(x=cut, y=price)) +
geom_jitter(alpha=1/4) +
geom_violin(alpha=1/2, draw_quantiles = .5)
g1 <- ggplot(diamonds, aes(x=carat, y=price)) +
geom_point(aes(color=cut))
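# theme_economist(), theme_fivethirtyeight(), theme_wsj() and scale_color_wsj() come from the ggthemes package
library(ggthemes)
g1 + theme_economist()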
g1 + theme_fivethirtyeight()
g1 + theme_wsj() + scale_color_wsj()
g1 + theme_bw()
g2 <- ggplot(diamonds, aes(x=carat, y=price)) +
geom_point()
g2 + labs(x='Carat', y='Price ($)', title='Price by Carat')
alternatively
g2 + xlab('Carat') + ylab('Price ($)') + ggtitle('Price by Carat')
add dollar sign to labels
g2 + labs(x='Carat', y='Price', title='Price by Carat') +
scale_y_continuous(labels=scales::dollar)
comma-delimit numbers. note that the :: syntax lets you use a function from a package you haven’t attached with library() (the package still has to be installed)
g2 + labs(x='Carat', y='Price ($)', title='Price by Carat') +
scale_y_continuous(labels=scales::comma)
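The label functions passed to scale_y_continuous() can also be called directly on a numeric vector, which is an easy way to preview what the axis will show. A small sketch, with made-up example prices:

```r
# No library(scales) call here: :: looks the function up in the installed package.
prices <- c(326, 1500, 18823)   # illustrative values, not taken from a specific plot

dollar_labels <- scales::dollar(prices)
comma_labels  <- scales::comma(prices)

dollar_labels   # "$326" "$1,500" "$18,823"
comma_labels    # "326" "1,500" "18,823"

# The package is used without being attached to the search path
# (FALSE in a fresh session where library(scales) was never called).
"package:scales" %in% search()
```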
scale_color_brewer() uses the ColorBrewer palettes; some, but not all, of them are colorblind-friendly (see the other scale_color_ functions)
g1 + scale_color_brewer()
move the legend to the bottom
g1 + theme(legend.position='bottom')
xlim zooms by removing the data outside the limits. this is problematic because the smoothing curve is then fitted without those values, so the output is distorted
g3 <- g2 + geom_smooth()
g3 + xlim(c(0, 3))
## Warning: Removed 32 rows containing non-finite values (stat_smooth).
## Warning: Removed 32 rows containing missing values (geom_point).
so how can we just zoom?
g3 + coord_cartesian(xlim=c(1, 3))
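The difference between the two ways of zooming shows up in the data the smoother is fitted on. The sketch below is an illustration under assumptions: it uses method = "lm" to keep the fit quick (the workshop uses the default smoother) and the same 0 to 3 range for both versions so they can be compared.

```r
library(ggplot2)

# Linear smoother is an assumption made for speed.
base_plot <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_smooth(method = "lm", formula = y ~ x)

zoom_scale <- base_plot + xlim(0, 3)                       # scale limits: data outside is dropped first
zoom_coord <- base_plot + coord_cartesian(xlim = c(0, 3))  # coord limits: only the view is cropped

# layer_data() returns the fitted curve; xlim() triggers the "Removed ... rows" warning.
fit_scale <- suppressWarnings(layer_data(zoom_scale))
fit_coord <- layer_data(zoom_coord)

# Rows outside the scale limits, i.e. the ones xlim() throws away.
rows_dropped <- sum(diamonds$carat > 3)
rows_dropped

max(fit_scale$x)   # curve stops at the limit: fitted without the large diamonds
max(fit_coord$x)   # curve was fitted on the full carat range, then cropped
```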
rotate 90 degrees
g3 + coord_flip()
polar coordinates
g3 + coord_polar()
an ugly heatmap
library(scales)
library(tidyr)
library(reshape2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# economic indicators dataset
head(economics)
## Source: local data frame [6 x 6]
##
## date pce pop psavert uempmed unemploy
## (date) (dbl) (int) (dbl) (dbl) (int)
## 1 1967-07-01 507.4 198712 12.5 4.5 2944
## 2 1967-08-01 510.5 198911 12.5 4.7 2945
## 3 1967-09-01 516.3 199113 11.7 4.6 2958
## 4 1967-10-01 512.9 199311 12.5 4.9 3143
## 5 1967-11-01 518.1 199498 12.5 4.7 3066
## 6 1967-12-01 525.8 199657 12.1 4.8 3018
#correlation matrix between these specific variables
econCor <- cor(economics[, c(2, 4:6)]) #this is in wide (matrix) format
# todo learn about piping in R %>% (cmd+shift+m)
econMelt <- melt(econCor, varnames=c('x', 'y'), value.name='Correlation')
head(econMelt)
## x y Correlation
## 1 pce pce 1.0000000
## 2 psavert pce -0.8370690
## 3 uempmed pce 0.7273492
## 4 unemploy pce 0.6139997
## 5 pce psavert -0.8370690
## 6 psavert psavert 1.0000000
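melt() turns the square correlation matrix into long format, one row per (x, y) pair, which is the shape geom_tile() needs. For comparison, here is a base-R sketch of the same reshaping. It uses three columns of the built-in mtcars data as a stand-in (an assumption; the workshop uses economics) and is not meant as a replacement for the melt() call.

```r
# Stand-in data: three columns of the built-in mtcars dataset.
cor_wide <- cor(mtcars[, c("mpg", "wt", "hp")])   # 3 x 3 matrix (wide format)

# as.table() + as.data.frame() yields one row per cell of the matrix.
cor_long <- as.data.frame(as.table(cor_wide), responseName = "Correlation")
names(cor_long)[1:2] <- c("x", "y")

dim(cor_wide)     # 3 rows, 3 columns
nrow(cor_long)    # 9 rows, one per cell
head(cor_long)
```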
econMelt <- econMelt %>% arrange(Correlation)
head(econMelt)
## x y Correlation
## 1 psavert pce -0.8370690
## 2 pce psavert -0.8370690
## 3 uempmed psavert -0.3874159
## 4 psavert uempmed -0.3874159
## 5 unemploy psavert -0.3540073
## 6 psavert unemploy -0.3540073
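The %>% in the arrange() call above passes the object on its left as the first argument of the function on its right. A minimal sketch with toy values (the vector and data frame are made up for illustration):

```r
library(dplyr)   # dplyr re-exports %>% from magrittr

values <- c(1, 4, 9)

# x %>% f() is f(x); chained calls read left to right.
piped  <- values %>% sqrt() %>% sum()
nested <- sum(sqrt(values))
identical(piped, nested)

# Same idea with a data frame, mirroring the arrange() call on econMelt.
toy <- data.frame(x = c("a", "b", "c"), Correlation = c(0.6, -0.8, 1.0))
sorted <- toy %>% arrange(Correlation)
sorted
```

R 4.1 and later also provide a native pipe, |>, which covers this simple first-argument case without loading a package.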
# set heatmap to h
h <- ggplot(econMelt, aes(x=x, y=y)) + geom_tile(aes(fill=Correlation))
#now style the heatmap
h + scale_fill_gradient2(low=muted('red'), mid='white', high='steelblue',
guide=guide_colorbar(ticks=FALSE, barheight=10), limits=c(-1, 1)) +
theme_minimal() + labs(x=NULL, y=NULL)